
    Bounding the Error From Reference Set Kernel Maximum Mean Discrepancy

    In this paper, we bound the error induced by using a weighted skeletonization of two data sets for computing a two sample test with kernel maximum mean discrepancy. The error is quantified in terms of the speed at which heat diffuses from those points to the rest of the data, as well as how flat the weights on the reference points are, and yields a non-asymptotic, non-probabilistic bound. The result ties into the problem of the eigenvector triple product, which appears in a number of important problems. The error bound also suggests an optimization scheme for choosing the best set of reference points and weights. The method is tested on several two-sample test examples.
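
    As a rough, self-contained sketch (not the paper's optimized construction), the snippet below compares two samples only through a small weighted reference set: the kernel mean embeddings of both samples are evaluated at the reference points and combined with the weights. The Gaussian kernel, the randomly drawn reference points, and the uniform weights are all placeholder choices.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def reference_mmd(x, y, refs, weights, sigma=1.0):
    """Weighted reference-set MMD-style statistic (illustrative only).

    The witness function is evaluated only at the reference points and its
    squared values are combined with the given weights.
    """
    witness = gaussian_kernel(x, refs, sigma).mean(0) - gaussian_kernel(y, refs, sigma).mean(0)
    return float(np.sum(weights * witness ** 2))

# toy two-sample test: samples from slightly shifted Gaussians
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(500, 2))
y = rng.normal(0.3, 1.0, size=(500, 2))
refs = rng.normal(0.0, 1.5, size=(20, 2))      # candidate reference set
weights = np.full(len(refs), 1.0 / len(refs))  # uniform weights as a baseline
print(reference_mmd(x, y, refs, weights))
```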

    On Suprema of Autoconvolutions with an Application to Sidon sets

    Let $f$ be a nonnegative function supported on $(-1/4, 1/4)$. We show $$\sup_{x \in \mathbb{R}} \int_{\mathbb{R}} f(t)f(x-t)\,dt \geq 1.28\left(\int_{-1/4}^{1/4} f(x)\,dx \right)^2,$$ where 1.28 improves on a series of earlier results. The inequality arises naturally in additive combinatorics in the study of Sidon sets. We derive a relaxation of the problem that reduces to a finite number of cases and yields slightly stronger results. Our approach should be able to prove lower bounds that are arbitrarily close to the sharp result. Currently, the bottleneck in our approach is runtime: new ideas might be able to significantly speed up the computation.
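
    A quick numerical sanity check of the stated inequality (the grid and the test function below are arbitrary choices): discretize a nonnegative $f$ supported on $(-1/4, 1/4)$, approximate the autoconvolution by a discrete convolution, and compare its supremum with $1.28(\int f)^2$. For the constant function the ratio equals $2$.

```python
import numpy as np

# Discretize f on (-1/4, 1/4) and compare sup_x (f*f)(x) with 1.28 * (integral of f)^2.
n = 4001
x = np.linspace(-0.25, 0.25, n)
dx = x[1] - x[0]
f = np.ones_like(x)                  # any nonnegative profile can be substituted here

autoconv = np.convolve(f, f) * dx    # samples of (f*f) on (-1/2, 1/2)
sup_autoconv = autoconv.max()
mass_squared = (f.sum() * dx) ** 2

print(sup_autoconv, 1.28 * mass_squared, sup_autoconv / mass_squared)  # ratio ~ 2
```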

    Spectral Echolocation via the Wave Embedding

    Spectral embedding uses eigenfunctions of the discrete Laplacian on a weighted graph to obtain coordinates for an embedding of an abstract data set into Euclidean space. We propose a new pre-processing step of first using the eigenfunctions to simulate a low-frequency wave moving over the data and using both the position and the change in time of the wave to obtain a refined metric to which classical methods of dimensionality reduction can then be applied. This is motivated by the behavior of waves, symmetries of the wave equation, and the hunting technique of bats. It is shown to be effective in practice and also works for other partial differential equations -- the method yields improved results even for the classical heat equation.
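
    A minimal sketch of the pre-processing idea, under placeholder choices (Gaussian graph weights, the unnormalized Laplacian, a handful of low-frequency modes and observation times): the wave $u(x,t) = \sum_k \cos(\sqrt{\lambda_k}\, t)\, \phi_k(x)$ and its time derivative are recorded at several times and used as refined coordinates for each point.

```python
import numpy as np

def wave_features(X, n_eig=8, times=(0.5, 1.0, 2.0, 4.0), sigma=None):
    """Refined coordinates from a simulated low-frequency wave (illustrative)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    if sigma is None:
        sigma = np.median(np.sqrt(d2))
    W = np.exp(-d2 / (2 * sigma ** 2))
    L = np.diag(W.sum(1)) - W                          # unnormalized graph Laplacian
    lam, phi = np.linalg.eigh(L)
    lam, phi = lam[1:n_eig + 1], phi[:, 1:n_eig + 1]   # drop the constant mode
    freq = np.sqrt(np.maximum(lam, 0.0))

    feats = []
    for t in times:
        feats.append(phi @ np.cos(freq * t))            # wave position u(x, t)
        feats.append(phi @ (-freq * np.sin(freq * t)))  # wave velocity du/dt(x, t)
    return np.stack(feats, axis=1)                      # one refined coordinate row per point

X = np.random.default_rng(1).normal(size=(200, 3))
print(wave_features(X).shape)                           # (200, 8): 2 features per time
```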

    Two-sample Statistics Based on Anisotropic Kernels

    The paper introduces a new kernel-based Maximum Mean Discrepancy (MMD) statistic for measuring the distance between two distributions given finitely many multivariate samples. When the distributions are locally low-dimensional, the proposed test can be made more powerful to distinguish certain alternatives by incorporating local covariance matrices and constructing an anisotropic kernel. The kernel matrix is asymmetric; it computes the affinity between $n$ data points and a set of $n_R$ reference points, where $n_R$ can be drastically smaller than $n$. While the proposed statistic can be viewed as a special class of Reproducing Kernel Hilbert Space MMD, the consistency of the test is proved, under mild assumptions on the kernel, as long as $\|p-q\| \sqrt{n} \to \infty$, and a finite-sample lower bound of the testing power is obtained. Applications to flow cytometry and diffusion MRI datasets are demonstrated, which motivate the proposed approach to compare distributions.
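
    A sketch of the asymmetric, anisotropic affinity between $n$ data points and $n_R \ll n$ reference points; the nearest-neighbor covariance estimate and the regularization below are placeholder choices rather than the paper's exact construction.

```python
import numpy as np

def anisotropic_affinity(X, refs, k_local=20, reg=1e-3):
    """Asymmetric affinity matrix between n data points and n_R references (illustrative).

    Each reference point r gets a local covariance estimated from its k_local nearest
    data points; its affinity to x is a Mahalanobis-type Gaussian,
    exp(-(x - r)^T C_r^{-1} (x - r) / 2).
    """
    n, d = X.shape
    A = np.zeros((n, len(refs)))
    for j, r in enumerate(refs):
        dist = np.linalg.norm(X - r, axis=1)
        nbrs = X[np.argsort(dist)[:k_local]]
        C = np.cov(nbrs.T) + reg * np.eye(d)           # local covariance at r
        diff = X - r
        A[:, j] = np.exp(-0.5 * np.einsum('ij,jk,ik->i', diff, np.linalg.inv(C), diff))
    return A

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (300, 2)), rng.normal(2, 1, (300, 2))])
refs = X[rng.choice(len(X), 30, replace=False)]        # n_R << n reference points
print(anisotropic_affinity(X, refs).shape)             # (600, 30)
```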

    Variational Diffusion Autoencoders with Random Walk Sampling

    Variational autoencoders (VAEs) and generative adversarial networks (GANs) enjoy an intuitive connection to manifold learning: in training, the decoder/generator is optimized to approximate a homeomorphism between the data distribution and the sampling space. This is a construction that strives to define the data manifold. A major obstacle to VAEs and GANs, however, is choosing a suitable prior that matches the data topology. Well-known consequences of poorly picked priors are posterior and mode collapse. To our knowledge, no existing method sidesteps this user choice. Conversely, diffusion maps automatically infer the data topology and enjoy a rigorous connection to manifold learning, but do not scale easily or provide the inverse homeomorphism (i.e. decoder/generator). We propose a method that combines these approaches into a generative model that inherits the asymptotic guarantees of diffusion maps while preserving the scalability of deep models. We prove approximation-theoretic results for the dimension dependence of our proposed method. Finally, we demonstrate the effectiveness of our method with various real and synthetic datasets. Comment: 24 pages, 9 figures, 1 table; accepted to ECCV 202
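
    Two classical ingredients referenced above, sketched minimally and independently of the paper's actual architecture: a diffusion-map embedding built from a random-walk transition matrix on the data, and sampling of a latent point by running a short random walk with that matrix (in the full model such a latent sample would then be passed through a learned decoder).

```python
import numpy as np

def diffusion_map(X, n_comp=2, sigma=1.0, t=1):
    """Minimal diffusion-map embedding (illustrative, not the paper's model)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    P = W / W.sum(1, keepdims=True)                   # random-walk transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)[1:n_comp + 1]      # skip the trivial eigenvalue 1
    return (vals.real[order] ** t) * vecs.real[:, order], P

def random_walk_sample(P, n_steps=3, rng=None):
    """Pick a data index by starting at random and walking n_steps of P."""
    rng = rng or np.random.default_rng()
    i = rng.integers(P.shape[0])
    for _ in range(n_steps):
        i = rng.choice(P.shape[0], p=P[i])
    return i

X = np.random.default_rng(2).normal(size=(300, 3))
Z, P = diffusion_map(X)
print(Z[random_walk_sample(P)])   # latent coordinate a decoder would map back to data space
```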

    People Mover's Distance: Class level geometry using fast pairwise data adaptive transportation costs

    We address the problem of defining a network graph on a large collection of classes. Each class consists of a collection of data points, sampled in a non-i.i.d. way from some unknown underlying distribution. The application we consider in this paper is a large-scale, high-dimensional survey of people living in the US, and the question of how similar or different the various counties in which these people live are. We use a co-clustering diffusion metric to learn the underlying distribution of people, and build an approximate earth mover's distance algorithm using this data-adaptive transportation cost.
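
    A minimal sketch of an entropy-regularized (Sinkhorn) approximation to the earth mover's distance between two classes; the Euclidean cost matrix below is only a placeholder for the paper's co-clustering diffusion metric.

```python
import numpy as np

def sinkhorn_emd(mu, nu, cost, eps=0.1, n_iter=200):
    """Entropy-regularized approximation of the earth mover's distance.

    mu, nu are histograms over the two classes' support points and `cost` is the
    pairwise transportation cost matrix (data-adaptive in the paper, generic here).
    """
    K = np.exp(-cost / eps)
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    plan = u[:, None] * K * v[None, :]        # approximate optimal transport plan
    return float((plan * cost).sum())

rng = np.random.default_rng(3)
a, b = rng.normal(0, 1, (40, 2)), rng.normal(1, 1, (50, 2))
cost = np.linalg.norm(a[:, None] - b[None, :], axis=-1)   # placeholder Euclidean cost
mu = np.full(len(a), 1 / len(a))
nu = np.full(len(b), 1 / len(b))
print(sinkhorn_emd(mu, nu, cost))
```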

    Bigeometric Organization of Deep Nets

    In this paper, we build an organization of high-dimensional datasets that cannot be cleanly embedded into a low-dimensional representation due to missing entries and a subset of the features being irrelevant to modeling functions of interest. Our algorithm begins by defining coarse neighborhoods of the points and defining an expected empirical function value on these neighborhoods. We then generate new non-linear features with deep net representations tuned to model the approximate function, and re-organize the geometry of the points with respect to the new representation. Finally, the points are locally z-scored to create an intrinsic geometric organization which is independent of the parameters of the deep net, a geometry designed to assure smoothness with respect to the empirical function. We examine this approach on data from the Centers for Medicare and Medicaid Services Hospital Quality Initiative, and generate an intrinsic low-dimensional organization of the hospitals that is smooth with respect to an expert-driven function of quality.
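
    The final local z-scoring step, sketched in isolation (the neighborhood size and the stand-in features below are arbitrary): each point's feature vector is centered and scaled by the statistics of its own coarse neighborhood, which removes dependence on the overall scale of the learned representation.

```python
import numpy as np

def local_zscore(features, n_neighbors=25):
    """Locally z-score each point's features against its nearest neighbors."""
    out = np.empty_like(features, dtype=float)
    for i, x in enumerate(features):
        dist = np.linalg.norm(features - x, axis=1)
        nbrs = features[np.argsort(dist)[:n_neighbors]]        # coarse neighborhood of point i
        out[i] = (x - nbrs.mean(0)) / (nbrs.std(0) + 1e-8)     # center and scale locally
    return out

F = np.random.default_rng(4).normal(size=(200, 8))             # stand-in deep-net features
print(local_zscore(F).shape)                                   # (200, 8)
```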

    Deep neural networks adapt to intrinsic dimensionality beyond the target domain

    We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation, where $\phi$ is a geometrically intuitive, dimensionality reducing feature map. We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets. We achieve near optimal approximation rates, which depend only on the complexity of the dimensionality reducing map $\phi$ rather than the ambient dimension. Since $\phi$ encapsulates all nonlinear features that are material to the function $f$, this suggests that deep nets are faithful to an intrinsic dimension governed by $f$ rather than the complexity of the domain of $f$. In particular, the prevalent assumption of approximating functions on low-dimensional manifolds can be significantly relaxed using functions of type $f(x) = g(\phi(x))$ with $\phi$ representing an orthogonal projection onto the same manifold.
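
    To make the function class concrete, here is a toy instance of $f(x) = g(\phi(x))$ with $\phi$ an orthogonal projection onto a low-dimensional subspace (a stand-in for the embedded-submanifold case): the ambient dimension is large, but $f$ depends on $x$ only through the $k$ coordinates produced by $\phi$.

```python
import numpy as np

rng = np.random.default_rng(5)
D, k = 100, 3                                   # large ambient dimension, small intrinsic one
Q, _ = np.linalg.qr(rng.normal(size=(D, k)))    # orthonormal basis of the target subspace

phi = lambda x: x @ Q                           # dimensionality-reducing feature map
g = lambda z: np.sin(z).sum(axis=-1)            # smooth function of the k reduced features
f = lambda x: g(phi(x))                         # the two-layer composition studied above

x = rng.normal(size=(10, D))
print(f(x).shape)                               # (10,)
```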

    Cautious Active Clustering

    We consider the problem of classification of points sampled from an unknown probability measure on a Euclidean space. We study the question of querying the class label at a very small number of judiciously chosen points so as to be able to attach the appropriate class label to every point in the set. Our approach is to consider the unknown probability measure as a convex combination of the conditional probabilities for each class. Our technique involves the use of a highly localized kernel constructed from Hermite polynomials, in order to create a hierarchical estimate of the supports of the constituent probability measures. We do not need to make any assumptions on the nature of any of the probability measures nor know in advance the number of classes involved. We give theoretical guarantees measured by the $F$-score for our classification scheme. Examples include classification in hyper-spectral images and MNIST classification.
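
    A heavily simplified sketch of the "cautious" querying idea only: a plain Gaussian kernel density estimate stands in for the localized Hermite-polynomial kernel, high-density points are grouped greedily, one label query is spent per group, and low-density points are left unlabeled. None of the parameters or the grouping rule come from the paper.

```python
import numpy as np

def cautious_queries(X, bandwidth=0.5, density_quantile=0.7, n_queries=5, rng=None):
    """Pick a few points whose labels to query; low-density points stay unlabeled."""
    rng = rng or np.random.default_rng()
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    density = np.exp(-d2 / (2 * bandwidth ** 2)).mean(1)           # crude density estimate
    confident = np.where(density >= np.quantile(density, density_quantile))[0]
    queries, remaining = [], set(confident.tolist())
    while remaining and len(queries) < n_queries:
        i = int(rng.choice(sorted(remaining)))
        queries.append(i)
        near = np.where(np.linalg.norm(X - X[i], axis=1) < 2 * bandwidth)[0]
        remaining -= set(near.tolist())                            # one query covers its group
    return queries

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(m, 0.3, size=(100, 2)) for m in (0.0, 2.0, 4.0)])
print(cautious_queries(X, rng=rng))                                # indices whose labels are requested
```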

    On the Dual Geometry of Laplacian Eigenfunctions

    We discuss the geometry of Laplacian eigenfunctions $-\Delta \phi = \lambda \phi$ on compact manifolds $(M,g)$ and combinatorial graphs $G=(V,E)$. The 'dual' geometry of Laplacian eigenfunctions is well understood on $\mathbb{T}^d$ (identified with $\mathbb{Z}^d$) and $\mathbb{R}^n$ (which is self-dual). The dual geometry plays a tremendous role in various fields of pure and applied mathematics. The purpose of our paper is to point out a notion of similarity between eigenfunctions that allows one to reconstruct that geometry. Our measure of 'similarity' $\alpha(\phi_{\lambda}, \phi_{\mu})$ between eigenfunctions $\phi_{\lambda}$ and $\phi_{\mu}$ is given by a global average of local correlations $$\alpha(\phi_{\lambda}, \phi_{\mu})^2 = \| \phi_{\lambda} \phi_{\mu} \|_{L^2}^{-2}\int_{M}{ \left( \int_{M}{ p(t,x,y)( \phi_{\lambda}(y) - \phi_{\lambda}(x))( \phi_{\mu}(y) - \phi_{\mu}(x)) dy} \right)^2 dx},$$ where $p(t,x,y)$ is the classical heat kernel and $e^{-t \lambda} + e^{-t \mu} = 1$. This notion recovers all classical notions of duality but is equally applicable to other (rough) geometries and graphs; many numerical examples in different continuous and discrete settings illustrate the result.
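
    A small sketch of computing $\alpha(\phi_{\lambda}, \phi_{\mu})$ on a weighted graph, with integrals replaced by sums over vertices, the heat kernel expanded in the Laplacian eigenbasis, and $t$ found by bisection from $e^{-t\lambda} + e^{-t\mu} = 1$; the ring graph at the end is just a convenient test case.

```python
import numpy as np

def eigenfunction_similarity(W, i, j):
    """alpha(phi_i, phi_j) for two nontrivial Laplacian eigenfunctions on a graph."""
    L = np.diag(W.sum(1)) - W
    lam, phi = np.linalg.eigh(L)
    li, lj = lam[i], lam[j]

    # solve exp(-t*li) + exp(-t*lj) = 1 for t by bisection
    g = lambda t: np.exp(-t * li) + np.exp(-t * lj) - 1.0
    lo, hi = 1e-9, 1e3
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    t = 0.5 * (lo + hi)

    p = phi @ np.diag(np.exp(-t * lam)) @ phi.T          # heat kernel matrix p(t, x, y)
    di = phi[None, :, i] - phi[:, None, i]               # phi_i(y) - phi_i(x)
    dj = phi[None, :, j] - phi[:, None, j]
    inner = (p * di * dj).sum(1)                         # local correlation at each x
    norm = np.sum((phi[:, i] * phi[:, j]) ** 2)          # ||phi_i phi_j||_{L^2}^2
    return float(np.sqrt((inner ** 2).sum() / norm))

# ring graph: compare the similarity of a few eigenfunction pairs
n = 40
W = np.zeros((n, n))
for k in range(n):
    W[k, (k + 1) % n] = W[(k + 1) % n, k] = 1.0
print(eigenfunction_similarity(W, 1, 2), eigenfunction_similarity(W, 1, 7))
```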